Smart Paper Notes

A GNN-RNN Approach for Harnessing Geospatial and Temporal Information: Application to Crop Yield Prediction

Joshua Fan , Junwen Bai , Zhiyun Li , Ariel Ortiz-Bobea , Carla P. Gomes

Category: Machine Learning

2021-11-17

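Climate change poses new challenges for crop-related concerns, including food insecurity, supply stability, and economic planning. As one of the central challenges, crop yield prediction has become a pressing task in the machine learning field. Despite its importance, the prediction task is exceptionally complicated, because crop yields depend on many factors such as weather, land surface, and soil quality, as well as their interactions. In recent years, machine learning models have been applied successfully in this domain. However, these models either restrict their task to a relatively small region or study only a single year or a few years, which makes them hard to generalize spatially and temporally. In this paper, we introduce a novel graph-based recurrent neural network for crop yield prediction that incorporates both geographical and temporal knowledge into the model and further boosts predictive power. Our method is trained, validated, and tested on over 2,000 counties from 41 states in the US mainland, covering the years 1981 to 2019. As far as we know, this is the first machine learning method that embeds geographical knowledge in crop yield prediction and predicts crop yields at the county level nationwide. We also lay a solid foundation for comparison with other machine learning baselines by applying well-known linear models, tree-based models, and deep learning methods and comparing their performance. Experiments show that our proposed method consistently outperforms existing state-of-the-art methods on various metrics, validating the effectiveness of geospatial and temporal information.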

Translated by Google Translate

Generalizability of Deep Adult Lung Segmentation Models to the Pediatric Population: A Retrospective Study

Sivaramakrishnan Rajaraman , Feng Yang , Ghada Zamzmi , Zhiyun Xue , Sameer Antani

Category: Computer Vision

2022-11-04

Lung segmentation in chest X-rays (CXRs) is an important prerequisite for improving the specificity of diagnoses of cardiopulmonary diseases in a clinical decision support system. Current deep learning (DL) models for lung segmentation are trained and evaluated on CXR datasets in which the radiographic projections are captured predominantly from the adult population. However, the shape of the lungs is reported to differ significantly in pediatric patients across the developmental stages from infancy to adulthood. This might result in age-related data domain shifts that adversely impact lung segmentation performance when models trained on the adult population are deployed for pediatric lung segmentation. In this work, our goal is to analyze the generalizability of deep adult lung segmentation models to the pediatric population and to improve performance through a systematic combinatorial approach consisting of CXR modality-specific weight initializations, stacked generalization, and an ensemble of the stacked generalization models. Novel evaluation metrics, Mean Lung Contour Distance and Average Hash Score, are proposed in addition to the Multi-scale Structural Similarity Index Measure, Intersection over Union, and Dice metrics to evaluate segmentation performance. We observed a significant improvement (p < 0.05) in cross-domain generalization through our combinatorial approach. This study could serve as a paradigm for analyzing the cross-domain generalizability of deep segmentation models for other medical imaging modalities and applications.
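As an illustration of two of the overlap metrics named in this abstract, the following is a minimal sketch of Dice and Intersection over Union for binary masks. It is not the authors' evaluation code; the function name `dice_and_iou` and the nested-list mask format are assumptions made for this example.

```python
def dice_and_iou(pred, truth):
    """Return (Dice, IoU) for two equally sized binary masks.

    Masks are nested lists of 0/1 values (a hypothetical format chosen
    to keep the sketch dependency-free).
    """
    p = [v for row in pred for v in row]
    t = [v for row in truth for v in row]
    if len(p) != len(t):
        raise ValueError("masks must have the same number of pixels")

    inter = sum(1 for a, b in zip(p, t) if a and b)  # pixels set in both
    p_sum = sum(1 for a in p if a)                   # predicted foreground
    t_sum = sum(1 for b in t if b)                   # ground-truth foreground
    union = p_sum + t_sum - inter

    if union == 0:
        # Both masks are empty: treat as perfect agreement.
        return 1.0, 1.0

    dice = 2 * inter / (p_sum + t_sum)
    iou = inter / union
    return dice, iou


pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
dice, iou = dice_and_iou(pred, truth)
```

The two metrics are monotonically related (Dice = 2·IoU / (1 + IoU)), so they rank models identically on a single image but can differ once averaged over a dataset.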


Deep ensemble learning for segmenting tuberculosis-consistent manifestations in chest radiographs

Sivaramakrishnan Rajaraman , Feng Yang , Ghada Zamzmi , Peng Guo , Zhiyun Xue , Sameer K Antani

Category: Computer Vision

2022-06-13

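Automated segmentation of tuberculosis (TB)-consistent lesions in chest X-rays (CXRs) using deep learning (DL) methods can help reduce radiologist effort, supplement clinical decision-making, and potentially improve patient treatment. Most works in the literature discuss training automatic segmentation models using coarse bounding-box annotations. However, the granularity of bounding-box annotations can lead to the inclusion of a considerable fraction of false positives and false negatives at the pixel level, which may adversely affect overall semantic segmentation performance. This study (i) evaluates the benefits of using fine-grained annotations of TB-consistent lesions and (ii) trains and constructs ensembles of U-Net model variants for segmenting these lesions in CXRs. We evaluated segmentation performance using several ensemble methods, such as bitwise-AND, bitwise-OR, bitwise-MAX, and stacking. We observed that the stacking ensemble demonstrated superior segmentation performance (Dice score: 0.5743, 95% confidence interval: (0.4055, 0.7431)) compared with the individual constituent models and the other ensemble methods. To the best of our knowledge, this is the first study to apply ensemble learning to improve fine-grained TB-consistent lesion segmentation performance.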
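The simpler ensemble rules mentioned in this abstract can be sketched as follows. This is an illustrative sketch only, not the paper's implementation: the function names, the nested-list mask format, and the reading of bitwise-MAX as a pixelwise maximum over probability maps followed by a 0.5 threshold are all assumptions. Stacking, which trains a meta-learner on the constituent predictions, is not shown.

```python
def bitwise_and(masks):
    """Keep a pixel only if every model marked it as lesion."""
    rows, cols = len(masks[0]), len(masks[0][0])
    return [[int(all(m[r][c] for m in masks)) for c in range(cols)]
            for r in range(rows)]


def bitwise_or(masks):
    """Keep a pixel if at least one model marked it as lesion."""
    rows, cols = len(masks[0]), len(masks[0][0])
    return [[int(any(m[r][c] for m in masks)) for c in range(cols)]
            for r in range(rows)]


def pixelwise_max(prob_maps, threshold=0.5):
    """Take the highest probability per pixel, then binarize.

    Assumed interpretation of a MAX-style ensemble; the threshold
    value is a hypothetical choice for this sketch.
    """
    rows, cols = len(prob_maps[0]), len(prob_maps[0][0])
    return [[int(max(p[r][c] for p in prob_maps) >= threshold)
             for c in range(cols)]
            for r in range(rows)]


# Two hypothetical model outputs on a 2x2 image.
mask_a = [[1, 0], [1, 1]]
mask_b = [[1, 1], [0, 1]]
probs_a = [[0.9, 0.2], [0.4, 0.6]]
probs_b = [[0.1, 0.3], [0.7, 0.8]]

and_mask = bitwise_and([mask_a, mask_b])
or_mask = bitwise_or([mask_a, mask_b])
max_mask = pixelwise_max([probs_a, probs_b])
```

AND favors precision (fewer false positives), while OR and MAX favor recall, which is one reason a learned combination such as stacking can outperform any fixed rule.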


E2E Segmenter: Joint Segmenting and Decoding for Long-Form ASR

W. Ronny Huang , Shuo-yiin Chang , David Rybach , Rohit Prabhavalkar , Tara N. Sainath , Cyril Allauzen , Cal Peyser , Zhiyun Lu

Category: Natural Language Processing | Machine Learning

2022-04-22

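Improving the performance of end-to-end ASR models on long utterances, ranging from minutes to hours in length, is an ongoing challenge in speech recognition. A common solution is to segment the audio in advance using a separate voice activity detector (VAD) that decides segment boundary locations based purely on acoustic speech/non-speech information. VAD segmenters, however, may be suboptimal for real-world speech, where, for example, a complete sentence that should be taken as a whole may contain hesitations in the middle ("set an alarm for... 5 o'clock"). We propose replacing the VAD with an end-to-end ASR model capable of predicting segment boundaries in a streaming fashion, allowing the segmentation decision to be conditioned not only on better acoustic features but also on semantic features from the decoded text, with negligible extra computation. In experiments on real-world long-form audio (YouTube) with lengths of up to 30 minutes, we demonstrate an 8.5% relative improvement and a 250 ms reduction in median end-of-segment latency compared with the VAD segmenter baseline on a state-of-the-art Conformer RNN-T model.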


Selective Synthetic Augmentation with HistoGAN for Improved Histopathology Image Classification

Yuan Xue , Jiarong Ye , Qianying Zhou , Rodney Long , Sameer Antani , Zhiyun Xue , Carl Cornwell , Richard Zaino , Keith Cheng , Xiaolei Huang

Category: Computer Vision

2021-11-10

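Histopathological analysis is the current gold standard for diagnosing precancerous lesions. Automated histopathological classification from digital images requires supervised training, which in turn requires a large number of expert annotations that can be expensive and time-consuming to collect. Meanwhile, accurate classification of image patches cropped from whole-slide images is essential for standard sliding-window-based histopathology slide classification methods. To mitigate these issues, we propose a carefully designed conditional GAN model, HistoGAN, for synthesizing realistic histopathology image patches conditioned on class labels. We also investigate a novel synthetic augmentation framework that selectively adds new synthetic image patches generated by our proposed HistoGAN, rather than directly expanding the training set with synthetic images. By selecting synthetic images based on the confidence of their assigned labels and their feature similarity to real labeled images, our framework provides quality assurance for synthetic augmentation. Our models are evaluated on two datasets: a cervical histopathology image dataset with limited annotations, and a dataset of lymph node histopathology images with metastatic cancer. We show that leveraging HistoGAN-generated images with selective augmentation yields significant and consistent improvements in classification performance (6.7% and 2.8% higher accuracy, respectively) on the cervical histopathology and metastatic cancer datasets.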
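The selection idea in this abstract, keeping a synthetic patch only when its assigned label is confident and its features resemble real labeled images, can be sketched as below. This is not the HistoGAN authors' procedure: the use of class centroids, cosine similarity, the threshold values, and all names (`select_synthetic`, `min_conf`, `min_sim`) are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def select_synthetic(synthetic, real_features_by_label,
                     min_conf=0.9, min_sim=0.8):
    """Keep synthetic samples that pass both quality checks.

    synthetic: list of dicts with 'label', 'confidence', 'feature' keys
    (a hypothetical record format). Thresholds are illustrative.
    """
    centroids = {label: centroid(feats)
                 for label, feats in real_features_by_label.items()}
    kept = []
    for sample in synthetic:
        center = centroids.get(sample["label"])
        if center is None:
            continue  # no real images of this class to compare against
        confident = sample["confidence"] >= min_conf
        similar = cosine(sample["feature"], center) >= min_sim
        if confident and similar:
            kept.append(sample)
    return kept


real = {0: [[1.0, 0.0], [0.9, 0.1]], 1: [[0.0, 1.0]]}
synthetic = [
    {"id": "a", "label": 0, "confidence": 0.95, "feature": [1.0, 0.05]},
    {"id": "b", "label": 0, "confidence": 0.60, "feature": [1.0, 0.0]},
    {"id": "c", "label": 1, "confidence": 0.99, "feature": [1.0, 0.0]},
]
kept_ids = [s["id"] for s in select_synthetic(synthetic, real)]
```

In this toy run, sample "b" is rejected for low label confidence and sample "c" for lying far from real images of its class, so only "a" would be added to the training set.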


M2: Mixed Models with Preferences, Popularities and Transitions for Next-Basket Recommendation

Bo Peng , Zhiyun Ren , Srinivasan Parthasarathy , Xia Ning

Category: Machine Learning | Machine Learning (Statistics)

2020-04-03

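Next-basket recommendation considers the problem of recommending a set of items for the next basket that users will purchase as a whole. In this paper, we develop a novel mixed model with preferences, popularities, and transitions (M2) for next-basket recommendation. The method models three important factors in the next-basket generation process: 1) users' general preferences over items, 2) items' global popularities, and 3) transition patterns among items. Unlike existing recurrent neural network-based approaches, M2 does not use complicated networks to model the transitions among items or to generate embeddings for users. Instead, it uses a simple encoder-decoder-based approach (ED-Trans) to better model the transition patterns among items. We compared M2, with different combinations of the factors, against 5 state-of-the-art next-basket recommendation methods on 4 public benchmark datasets in recommending the first, second, and third next basket. Our experimental results demonstrate that M2 significantly outperforms the state-of-the-art methods on all datasets in all tasks, with improvements of up to 22.1%. In addition, our ablation study demonstrates that ED-Trans is more effective than recurrent neural networks in terms of recommendation performance. We also provide a thorough discussion of the various experimental protocols and evaluation metrics used for evaluating next-basket recommendation.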


Cross Modal Transformer via Coordinates Encoding for 3D Object Detection

Junjie Yan , Yingfei Liu , Jianjian Sun , Fan Jia , Shuailin Li , Tiancai Wang , Xiangyu Zhang

Category: Computer Vision

2023-01-03

In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple, yet its performance is impressive. CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains highly robust even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.


Backdoor Attacks Against Dataset Distillation

Yugeng Liu , Zheng Li , Michael Backes , Yun Shen , Yang Zhang

Category: Machine Learning

2023-01-03

Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
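The abstract describes NAIVEATTACK as adding triggers to the raw data before distillation begins. A minimal sketch of that kind of trigger stamping is shown below. It is not the authors' code: the corner-patch trigger, the patch size, the poisoning ratio, relabeling to a target class, and the names `stamp_trigger` and `poison_dataset` are all assumptions for illustration, and the distillation step itself is not shown.

```python
import random


def stamp_trigger(image, size=2, value=1.0):
    """Return a copy of a 2D image with a square patch in the bottom-right corner."""
    stamped = [row[:] for row in image]  # copy so the original is untouched
    height, width = len(stamped), len(stamped[0])
    for r in range(height - size, height):
        for c in range(width - size, width):
            stamped[r][c] = value
    return stamped


def poison_dataset(images, labels, target_label, ratio=0.1, seed=0):
    """Stamp the trigger on a random fraction of images and relabel them.

    Returns (new_images, new_labels, poisoned_indices). The resulting
    dataset would then be passed to a distillation procedure.
    """
    rng = random.Random(seed)
    n_poison = int(len(images) * ratio)
    chosen = set(rng.sample(range(len(images)), n_poison))

    new_images, new_labels = [], []
    for index, (image, label) in enumerate(zip(images, labels)):
        if index in chosen:
            new_images.append(stamp_trigger(image))
            new_labels.append(target_label)
        else:
            new_images.append([row[:] for row in image])
            new_labels.append(label)
    return new_images, new_labels, chosen


# Ten blank 4x4 "images", all of class 0; poison 20% toward class 1.
clean_images = [[[0.0] * 4 for _ in range(4)] for _ in range(10)]
clean_labels = [0] * 10
poisoned_images, poisoned_labels, poisoned_idx = poison_dataset(
    clean_images, clean_labels, target_label=1, ratio=0.2)
```

By contrast, DOORPING as described in the abstract updates the trigger iteratively throughout distillation, which a static stamp like this one does not capture.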


Language Models are Drummers: Drum Composition with Natural Language Pre-Training

Li Zhang , Chris Callison-Burch

Category: Natural Language Processing

2023-01-03

Automatic music generation with artificial intelligence typically requires a large amount of data, which is hard to obtain for many less common genres and musical instruments. To tackle this issue, we present ongoing work and preliminary findings on the possibility for deep models to transfer knowledge from language to music, by finetuning large language models pre-trained on a massive text corpus on only hundreds of MIDI files of drum performances. We show that by doing so, one of the largest state-of-the-art models (GPT3) is capable of generating reasonable drum grooves, while models that are not pre-trained (Transformer) show no such ability beyond naive repetition. Evaluating generated music is a challenging task, and evaluating drum grooves, for which there is little precedent in the literature, is even more so. Hence, we propose a tailored structural evaluation method and analyze drum grooves produced by GPT3 compared to those played by human professionals, exposing the strengths and weaknesses of such generation by language-to-music transfer. Our findings suggest that language-to-music transfer learning with large language models is viable and promising.


Reference Twice: A Simple and Unified Baseline for Few-Shot Instance Segmentation

Yue Han , Jiangning Zhang , Zhucun Xue , Chao Xu , Xintian Shen , Yabiao Wang , Chengjie Wang , Yong Liu , Xiangtai Li

Category: Computer Vision

2023-01-03

Few-Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes given only a few support examples. In this work, we explore a simple yet unified solution for FSIS and its incremental variants, and introduce a new framework named Reference Twice (RefT) that fully explores the relationship between support and query features using a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers that are better suited to re-weighting query features. Second, we find that support object queries already encode key factors after base training. The query features can therefore be enhanced twice, at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose linking object queries for better calibration via cross-attention. With these steps, performance on novel classes improves significantly over our strong baseline. Additionally, the framework can be easily extended to incremental FSIS with minor modification. When benchmarked on the COCO dataset under the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared with existing approaches across different shots; for example, we boost nAP by +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few-Shot Object Detection. Code and model will be available.
